[Livepatching][MVP] Extension changes to support livepatching in Canonical #341
rane-rajasi wants to merge 8 commits into
Conversation
Codecov Report
@@ Coverage Diff @@
## master #341 +/- ##
==========================================
+ Coverage 93.82% 93.96% +0.14%
==========================================
Files 105 105
Lines 18172 18695 +523
==========================================
+ Hits 17049 17566 +517
- Misses 1123 1129 +6
There was a problem hiding this comment.
Pull request overview
This PR adds an initial (“MVP”) end-to-end flow for Canonical Livepatch support in the Linux patching extension, including config ingestion, eligibility checks via Ubuntu Pro, and status reporting into the patch installation summary.
Changes:
- Add Ubuntu Pro pro status querying to detect whether Livepatch is enabled.
- Add Apt-based livepatch workflow (set cutoff-date, restart livepatch daemon, fetch livepatch status, write to installation summary).
- Add livepatching settings ingestion in ExecutionConfig and invoke livepatching from PatchInstaller.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/core/src/package_managers/UbuntuProClient.py | Adds pro status-based Livepatch enablement detection helpers. |
| src/core/src/package_managers/AptitudePackageManager.py | Adds livepatch execution + status parsing and reporting logic using canonical-livepatch. |
| src/core/src/core_logic/PatchInstaller.py | Triggers livepatching before the main installation flow when enabled via config. |
| src/core/src/core_logic/ExecutionConfig.py | Reads livepatching settings from disk and exposes enablement flags. |
| src/core/src/bootstrap/Constants.py | Adds livepatching settings path + config key constants. |
| is_livepatching_enabled = True | ||
| self.composite_logger.log_debug("Livepatching config values read from disk: [EnableLivePatching={0}] [EnabledBy={1}] [LastModified={2}]. Computed value of [IsLivePatchingEnabled={3}]" | ||
| .format(str(enable_livepatching), str(enabled_by), str(last_modified), str(is_livepatching_enabled))) | ||
| else: |
There was a problem hiding this comment.
The opposite of true isn't implicitly false. It can always be true, false or 'do nothing' (either because of bad data or because of no intent).
You're logging the computed intent but never logging the actual data read out.
There was a problem hiding this comment.
Added the actual value (read from file config) in the else block
| def __is_livepatching_enabled(self, livepatching_settings): | ||
| """ Determines if livepatching is enabled or disabled. """ |
There was a problem hiding this comment.
Ambiguous function naming and comment -- livepatching being enabled or not on the system is (separate) system state. Our goal is to capture the customer's intent on what they want us to do regardless of the system state. That goal wouldn't map 1:1 to this in naming given that separate state.
There was a problem hiding this comment.
I've changed livepatching to livepatch in all places except a few where using livepatching seemed more correct. Please review and let me know your thoughts
| return is_eula_accepted | ||
|
|
||
| def __get_livepatching_config_in_json(self): | ||
| """ Reads customer provided config on live patching from disk and returns a dict with the config values. |
There was a problem hiding this comment.
nit: consistency in phrasing. livepatching as the noun is livepatch. (all versions without a space)
Comment in general applies everywhere.
There was a problem hiding this comment.
Specifically, use livepatch over livepatching wherever livepatch makes grammatical sense in the context, as the Linux-wide usage of the phrase is 'livepatch' -- https://www.kernel.org/doc/html/latest/livepatch/livepatch.html
|
|
||
| #region Livepatching | ||
| def start_livepatching(self): | ||
| """ Applies livepatches on the machine, if it's pre-req are met""" |
|
|
||
| @staticmethod | ||
| def __reformat_date_for_livepatch(date_str): | ||
| """Converts AzGPS date format (20240401T000000Z) to ISO 8601 date string (2024-04-01T00:00:00Z).""" |
There was a problem hiding this comment.
Both are ISO 8601 date formats - one is basic and one is extended. Include in Envlayer->Datetime as generic.
| livepatch_fields = extracted_livepatch_fields[0] | ||
| check_state = livepatch_fields["CheckState"] | ||
| state = livepatch_fields["State"] | ||
| patch_name = "livepatch_" + check_state + "_" + state | ||
| patch_version = livepatch_fields["Version"] |
There was a problem hiding this comment.
All this needs to be documented in the spec.
There was a problem hiding this comment.
Please provide a range of end-user-facing examples
There was a problem hiding this comment.
Added a sample format with possible values for different fields both in code and in this PR's description: #341 (comment)
| ENABLE_LIVEPATCHING = 'EnableLivePatching' | ||
| LIVEPATCH_ONLY = 'LivePatchOnly' | ||
| ENABLED_BY = 'EnabledBy' | ||
| LAST_MODIFIED = 'LastModified' |
There was a problem hiding this comment.
Leave inline comments on the first two on exactly what they are supposed to mean
| enable_livepatching = self.__fetch_specific_setting(livepatching_settings, Constants.LivePatchingSettings.ENABLE_LIVEPATCHING) | ||
| enabled_by = self.__fetch_specific_setting(livepatching_settings, Constants.LivePatchingSettings.ENABLED_BY) | ||
| last_modified = self.__fetch_specific_setting(livepatching_settings, Constants.LivePatchingSettings.LAST_MODIFIED) | ||
| if enable_livepatching is not None and enable_livepatching in [True, 'True', 'true', '1', 1]: |
There was a problem hiding this comment.
What about TRUE? Is there any way to do a case-insensitive compare, i.e. cast the value to string and lowercase it before the compare?
There was a problem hiding this comment.
Made a code change to address this and the comment below on a helper function
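A minimal sketch of the kind of helper discussed in this thread, assuming the setting may arrive as a bool, an int, or a string in any casing. The name parse_config_flag and the accepted value sets are hypothetical and are not taken from the PR. It returns None for missing or unrecognised values so the caller can tell "false" apart from "no usable intent", in line with the earlier comment that the opposite of true is not implicitly false.

```python
from typing import Any, Optional

# Hypothetical accepted spellings; the PR's actual list may differ.
_TRUE_VALUES = {"true", "1"}
_FALSE_VALUES = {"false", "0"}


def parse_config_flag(raw_value: Any) -> Optional[bool]:
    """Returns True or False for recognised values, or None when the value is missing or unrecognised."""
    if raw_value is None:
        return None
    # Cast to string and lowercase so that True, 'TRUE', 'true', 1 and '1' all compare equal.
    normalised = str(raw_value).strip().lower()
    if normalised in _TRUE_VALUES:
        return True
    if normalised in _FALSE_VALUES:
        return False
    return None
```

Logging the raw value next to the parsed result keeps both the data read from disk and the computed intent visible.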
|
|
||
| return is_livepatching_enabled | ||
|
|
||
| def __is_livepatch_only_enabled(self, livepatching_settings): |
There was a problem hiding this comment.
Can we standardize function names when it comes to using "livepatch" and "livepatching"?
There was a problem hiding this comment.
Done in most places, refer to my other comment: #341 (comment)
| error_msg = "[APM] Command to set cutoff date in livepatch config failed. [Cmd={0}][Code={1}][Output={2}]".format(str(cmd), str(code), str(output)) | ||
| self.composite_logger.log_error(error_msg) | ||
| self.status_handler.add_error_to_status(error_msg, Constants.PatchOperationErrorCodes.LIVEPATCH_ERROR) | ||
| # Q: should we disable livepatching if we fail to set cutoff date in config since it's a critical config for livepatching to work properly? |
There was a problem hiding this comment.
@kjohn-msft, please share your thoughts on this one. I'm referring to the livepatch service on a VM. Our current expectation is that the customer needs to enable/activate it. This is one of the prereqs and is enforced in ubuntu_pro_client.is_livepatch_service_enabled_on_machine()
Livepatching is a service managed by the livepatch client. When this service is enabled, the livepatch client is responsible for applying livepatches during its scheduled run. When we set a cutoff-date, the livepatch client honors it and applies only those patches that were available on or before the cutoff date. Without a cutoff-date, the livepatch client applies all patches available at the time it executes.
Ideally, we should not reset the livepatch service, since we require the customer to enable it. But if this service is enabled and AzGPS is not able to set a cutoff-date, livepatching will continue on the VM at its preset cadence and we will never report any status for livepatching. This creates an information gap.
IMO, LPE should disable livepatching if we fail to set a cutoff date and report it as an error in the status blob. Let me know your thoughts.
There was a problem hiding this comment.
This isn't a code question - this is a design-time workflow question. It's good to have a flow chart shared internally so that there's consensus on what the behaviors in various states are supposed to be. It'll help with faster turnaround in code.
There was a problem hiding this comment.
Will share the flow chart in the PR review request
|
|
||
| @staticmethod | ||
| def datetime_iso_basic_string_to_extended_string(datetime_string): | ||
| """Converts ISO 8601 basic date format (20240401T000000Z) to extended format (2024-04-01T00:00:00Z).""" |
There was a problem hiding this comment.
type information missing for new code. see 413 for example
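A minimal sketch of what a typed version of this converter could look like, assuming Python 3 annotations are acceptable (the repository's own typing convention may differ). It parses with datetime.strptime, so malformed input raises ValueError instead of silently returning an empty string.

```python
from datetime import datetime


def datetime_iso_basic_string_to_extended_string(datetime_string: str) -> str:
    """Converts ISO 8601 basic format (20240401T000000Z) to extended format (2024-04-01T00:00:00Z).

    Raises ValueError if the input does not match the basic format.
    """
    parsed = datetime.strptime(datetime_string, "%Y%m%dT%H%M%SZ")
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Letting the ValueError propagate leaves the decision on how to report the failure with the caller.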
| ENABLE_LIVEPATCH = 'EnableLivePatch' # boolean config that if set to true indicates customer's ask on AzGPS to apply livepatches on their VM | ||
| LIVEPATCH_ONLY = 'LivePatchOnly' # boolean config that indicates if the customer is requesting only livepatches i.e. no cold patch or regular patching with reboots |
There was a problem hiding this comment.
What happens if both are true?
There was a problem hiding this comment.
@kjohn-msft If both are true, LPE should perform only livepatching i.e. no regular patch installation.
We had discussed the possibility of having this in MVP, back when we discussed the MVP implementation. I've added only the config read for now, without using or applying it anywhere. Let me know if the livepatch_only code flow should be added in MVP, or if this section should be removed.
| self.is_livepatch_requested = self.__is_livepatch_requested(self.livepatch_customer_config_settings) | ||
| self.is_livepatch_only_requested = self.__is_livepatch_only_requested(self.livepatch_customer_config_settings) |
There was a problem hiding this comment.
Same as in the config section - what happens if both are true
There was a problem hiding this comment.
Why do we have this code here if it's not consumed?
| return livepatch_customer_config | ||
|
|
||
| def __is_livepatch_requested(self, livepatch_settings): | ||
| """ Determines if livepatch is requested in config settings. Returns a boolean.""" |
There was a problem hiding this comment.
All new functions require type information (in and out)
| def __fetch_specific_eula_setting(settings_source, setting_to_fetch): | ||
| """ Returns the specific setting value from eula_settings_source or None if not found """ | ||
| def __fetch_specific_setting(settings_source, setting_to_fetch): | ||
| """ Returns the specific setting value from the given settings_source or None if not found """ |
There was a problem hiding this comment.
Function type info here and elsewhere
| if self.are_livepatch_prereq_met(): | ||
| self.start_livepatching_on_machine() | ||
| else: | ||
| self.composite_logger.log_warning("[APM] Livepatches are not applied since the pre-requisites were not met") |
There was a problem hiding this comment.
Log is unclear on what was not met and how to remediate.
There was a problem hiding this comment.
Which pre-reqs were not met and what to do, if any, is mentioned in are_livepatch_prereq_met() wherever they fail
| These pre-reqs are: Machine should be attached to a pro subscription and livepatch service should be enabled on the VM. """ | ||
| self.composite_logger.log_debug("[APM] Checking if all the pre-reqs to receive livepatches are met. NOTE: Livepatches is only available on Ubuntu LTS paid pro VMs and has to be in enabled state") | ||
| if not self.ubuntu_pro_client.is_livepatching_applicable_for_machine(): | ||
| error_message = "[APM] Livepatching is not applicable for this machine, hence no livepatches will be installed" |
There was a problem hiding this comment.
It's unclear from a customer perspective what this actually means. This is superficially tied to an internal definition of 'applicable'.
| return False | ||
|
|
||
| if not self.ubuntu_pro_client.is_livepatch_service_enabled_on_machine(): | ||
| error_message = ("[APM] Livepatch service is not enabled on this machine, hence no livepatches will be installed." |
There was a problem hiding this comment.
The Ubuntu Pro client reported that the Livepatch service is not enabled. Please enable it for Livepatching to succeed.
| self.composite_logger.log_warning("[APM] A stale livepatch status may be reported since a manual launch/restart of the livepatch client failed") | ||
| self.fetch_and_update_livepatch_status_in_status_blob() |
There was a problem hiding this comment.
Why are we reporting a stale status? (at all)
There was a problem hiding this comment.
Livepatch status depends on when livepatching client last ran, so a stale status could still contain livepatch details, if the client ran before LPE could explicitly launch it. LPE explicitly launching livepatch client in the previous step is a best effort attempt to ensure livepatches have been attempted at least once on the VM
| self.composite_logger.log_warning("[APM] AzGPS will not apply livepatch on the VM since the livepatch cutoff date was not set. " | ||
| "Please check previous logs for more details on why it failed and fix the issue before trying to apply livepatches again") |
There was a problem hiding this comment.
This isn't readily actionable to the customer (or internally). Who didn't set the livepatch cutoff date?
Try to consolidate messages into fewer items but be clear on: what is wrong, and what to do. And leave comments inline on the expected outcome when someone gets this message.
There was a problem hiding this comment.
Modified the log
| error_msg = "[APM] Livepatch config update Exception: [Exception={0}]".format(repr(error)) | ||
| self.composite_logger.log_error(error_msg) | ||
| self.status_handler.add_error_to_status(error_msg, Constants.PatchOperationErrorCodes.LIVEPATCH_ERROR) | ||
| # Q: should we disable livepatching if we fail to set cutoff date in config since it's a critical config for livepatching to work properly? |
There was a problem hiding this comment.
Try to dig deeper into this question to arrive at actual proposals - what happens if you disable livepatching? What is the risk to the customer between the choices available? etc
There was a problem hiding this comment.
Livepatch client (once livepatch service is enabled) runs at its preset schedule, will fetch and apply all available livepatches during its run.
LPE by setting config date is ensuring only a certain set of patches are applied (available on or before config date). If we fail to set config date, livepatch client will continue to run and apply all patches available to the VM. This is not the livepatching with strict SDP scenario that LPE is committing to. Hence, it makes sense for LPE to disable livepatch service and report an error, so no new livepatches will be applied by the livepatch client.
However, since we require customer to enable livepatch service, LPE disabling it blurs the line between service and customer roles. LPE could simply go with the current implementation of not interfering with any customer action and simply reporting an error to notify the customer of this scenario
|
|
||
| def launch_livepatch_client(self): | ||
| """ Launch livepatch client manually as best case effort to ensure livepatches are applied in a timely manner. | ||
| If this fails, livepatches will still be applied but it will be up to the machine's cron to trigger it""" |
There was a problem hiding this comment.
"it will be up to the machine's cron to trigger it"
- In isolation, this is a yellow flag suggesting that the workflow needs to be reviewed closer separate from the code. What are guaranteed outcomes in the flow? What are bad outcomes? In the case of bad outcomes, what are the mitigation steps? When they fail, what are the clear messages given to the customer + actionable guidance given? [whole workflow]
There was a problem hiding this comment.
I think this covers everything: #341 (comment)
| return launch_successful | ||
|
|
||
| def fetch_and_update_livepatch_status_in_status_blob(self): | ||
| """Fetches livepatch status and if a livepatch/es is/are applied, updates it as a new patch entry in PatchInstallationSummary""" |
There was a problem hiding this comment.
It's unclear why this is in patchinstallationsummary (only?). What is the assessment-side story?
There was a problem hiding this comment.
This follows both our initial sync on MVP code flow/design and also how the livepatch client works. When a livepatch service is enabled and the livepatch client runs, it simply identifies and applies the patches in the same run. There is no way of only assessing which livepatches are available. Hence the logic and status report only in patchinstallationsummary
| else: | ||
| error_msg = "[APM] Failed to fetch livepatch status. [Cmd={0}][Code={1}][Output={2}]".format(str(cmd), str(code), str(output)) | ||
| self.composite_logger.log_error(error_msg) | ||
| self.status_handler.add_error_to_status(error_msg, Constants.PatchOperationErrorCodes.LIVEPATCH_ERROR) |
There was a problem hiding this comment.
Isn't this redundant over a controlled throw into the exception block with a message of your choice when the code is non-zero?
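One way to read this suggestion, as a sketch only: raise a controlled exception inside the try block when the exit code is non-zero, so that a single except block owns logging and status reporting. The function name, the run_command callable, and the errors list below are hypothetical stand-ins for the extension's command runner and status handler.

```python
from typing import Callable, List, Optional, Tuple


def fetch_status_output(cmd: str,
                        run_command: Callable[[str], Tuple[int, str]],
                        errors: List[str]) -> Optional[str]:
    """Runs cmd and returns its output, or None after recording exactly one error."""
    try:
        code, output = run_command(cmd)
        if code != 0:
            # Controlled throw: a non-zero exit code takes the same path as any other failure.
            raise RuntimeError("Command failed. [Cmd={0}][Code={1}][Output={2}]".format(cmd, code, output))
        return output
    except Exception as error:
        # Single place where the failure is recorded (hypothetical stand-in for add_error_to_status).
        errors.append("[APM] Exception while fetching livepatch status. [Exception={0}]".format(repr(error)))
        return None
```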
| patch_status = Constants.NOT_SELECTED | ||
| if state.lower() == "applied": | ||
| patch_status = Constants.INSTALLED |
There was a problem hiding this comment.
These are the only 2 states?
For the previously discussed workflow diagram, can you map what goes into each status write in various branches of the flow?
There was a problem hiding this comment.
Following the reasoning here: #341 (comment)
Livepatches are auto applied whenever found, there's no only assess scenario for livepatching. So, livepatches are either available and applied or not available. NOT_SELECTED was the closest status to denote patches are not available.
This will change with the complete Livepatch design, since it will report livepatchsummary as a separate substatus, outside of patchinstallationsummary. We'll have more control over what to fetch from livepatch client and report in LPE
|
|
||
| return extracted | ||
|
|
||
| def __reformat_date_for_livepatch(self, date_str): |
There was a problem hiding this comment.
This is not a try_reformat -- to silently fail and return empty string
| { | ||
| "Client-Version": "<>", | ||
| "Machine-Id": "<>", | ||
| "Architecture": "<>", | ||
| "CPU-Model": "<>", | ||
| "Last-Check": "<>", | ||
| "Boot-Time": "<>", | ||
| "Uptime": "<>", | ||
| "Status": [ | ||
| { | ||
| "Kernel": "<>", | ||
| "Running": true, | ||
| "Livepatch": { | ||
| "CheckState": "checked", | ||
| "State": "<>", // "nothing-to-apply" or "applied" | ||
| "Version": "" // "" or a version such as "1.0", | ||
| "Fixes": // empty if no livepatches available or a list of CVEs installed | ||
| [{ | ||
| "Name": "<>", //cve identifier such as CVE-000-0000 | ||
| "Description": "<>", // description of the livepatch fix | ||
| "Bug": "", | ||
| "Patched": <bool> // boolean value indicating status | ||
| }] | ||
| }, | ||
| "Supported": "<>", // "supported" or a quick text on what is needed such as "kernel-upgrade-required" | ||
| "UpgradeRequiredDate": "<>" // date | ||
| }], | ||
| "tier": "updates", | ||
| "Excluded-LSNs": [], // List of excluded LSNs | ||
| "Fixed-CVEs": { | ||
| "Timestamp": "", | ||
| "Kernel-Package-Fixes": [], // list of all kernel packages fixed | ||
| "Installed-Kernels": [], | ||
| "Patched-CVEs": [], // list of patched CVEs identifiers | ||
| "Digest": "" | ||
| }, | ||
| "Blocking-Options": [ // List of configs blocking livepatch, if any. For eg: cutoff-date set for livepatch client | ||
| "cutoff-date" | ||
| ], | ||
| "Using-Cutoff-Date": <bool> // boolean value indicating whether livepatch client is using cutoff-date config or not | ||
| } """ |
There was a problem hiding this comment.
Dump into a new suitably named file at src\tools\references
And reference that file here. We shouldn't bloat code where avoidable.
| for status_item in livepatch_status.get("Status", []): | ||
| if status_item.get("Running", False) == True and status_item.get("Supported", "unsupported").lower() == "supported": | ||
| livepatch = status_item.get("Livepatch", {}) | ||
| extracted.append({ | ||
| "CheckState": livepatch.get("CheckState", ""), | ||
| "State": livepatch.get("State", ""), | ||
| "Version": livepatch.get("Version", "") | ||
| }) | ||
| break |
|
|
||
| # region Livepatch | ||
| def is_livepatching_applicable_for_machine(self): | ||
| """ Verifies if livepatching is applicable for the machine by checking if the machine is an Ubuntu LTS Pro VM """ |
There was a problem hiding this comment.
Capable or supported is a lot more accurate than applicable in this context.
vm_supports_livepatching for e.g. is a lot more precise
if vm_supports_livepatching() : // consider the clarity here
or
pro_client_attached_for_livepatching
if pro_client_attached_for_livepatching(): // all the code is easier to read - it's precise and crisp without having to go to the function definition
And use this for the message on 181 too (which + 180 is the most useful part that's helped so far)
There was a problem hiding this comment.
Renamed to pro_client_attached_for_livepatching() and updated logs wherever applicable
| return True | ||
|
|
||
| def is_livepatch_service_enabled_on_machine(self): | ||
| """ Verifies if livepatch service is enabled on the machine """ |
There was a problem hiding this comment.
is_ is implicit
Separately: all functions that are new need to have type data
| except Exception as error: | ||
| ubuntu_pro_client_exception = repr(error) | ||
| self.composite_logger.log_debug("[APM][Pro] Ubuntu Pro Client status Exception: [Exception={0}]".format(ubuntu_pro_client_exception)) | ||
| self.composite_logger.log_warning("[APM][Pro] Failed to determine if livepatch service is enabled on the machine due to error while querying Ubuntu Pro Client status.") | ||
| return livepatch_service_enabled |
There was a problem hiding this comment.
It's not known in the exception case whether it's actually true or false. And the code returns false.
This is a detail that is hidden from the caller (and is implicitly tied together today).
There was a problem hiding this comment.
Appended the log with "____ AzGPS will consider the service to be disabled."
kjohn-msft
left a comment
There was a problem hiding this comment.
Please share a flow chart of what this code is expected to do offline (especially for each consequential failure state - the message and the remediation recommended at each failure leaf).
Other comments inline
| self.__teardown(runtime) | ||
|
|
||
| def __write_livepatch_settings_to_file(self, livepatch_settings): | ||
| f = open(Constants.AzGPSPaths.LIVEPATCH_CUSTOMER_SETTINGS, "w+") |
| substatus_file_data = self.__get_substatus_from_status_file()[0] | ||
| errors = json.loads(substatus_file_data["formattedMessage"]["message"])["errors"] | ||
| self.assertNotEqual(errors, None) | ||
| self.assertTrue("Livepatches will NOT be applied since the VM is not attached to a pro subscription." in str(errors)) |
| substatus_file_data = self.__get_substatus_from_status_file()[0] | ||
| updated_errors = json.loads(substatus_file_data["formattedMessage"]["message"])["errors"] | ||
| self.assertNotEqual(updated_errors, None) | ||
| self.assertTrue("The Ubuntu Pro client reported that the Livepatch service is not enabled." in str(updated_errors)) |
| substatus_file_data = self.__get_substatus_from_status_file()[0] | ||
| errors = json.loads(substatus_file_data["formattedMessage"]["message"])["errors"] | ||
| self.assertNotEqual(errors, None) | ||
| self.assertTrue("Exception while fetching livepatch status." in str(errors)) |
| substatus_file_data = self.__get_substatus_from_status_file()[0] | ||
| errors = json.loads(substatus_file_data["formattedMessage"]["message"])["errors"] | ||
| self.assertNotEqual(errors, None) | ||
| self.assertTrue("Exception while fetching livepatch status" in str(errors)) |
| self.assertEqual(substatus_file_data["name"], Constants.PATCH_INSTALLATION_SUMMARY) | ||
| patch = json.loads(substatus_file_data["formattedMessage"]["message"])["patches"][0] | ||
| self.assertEqual(patch["name"],"livepatch_checked_applied") | ||
| self.assertTrue("Other" in str(patch["classifications"])) |
| self.assertEqual(substatus_file_data["name"], Constants.PATCH_INSTALLATION_SUMMARY) | ||
| patch = json.loads(substatus_file_data["formattedMessage"]["message"])["patches"][0] | ||
| self.assertEqual(patch["name"], "livepatch_checked_nothing-to-apply") | ||
| self.assertTrue("Other" in str(patch["classifications"])) |
| self.assertEqual(substatus_file_data["name"], Constants.PATCH_INSTALLATION_SUMMARY) | ||
| patch = json.loads(substatus_file_data["formattedMessage"]["message"])["patches"][0] | ||
| self.assertEqual(patch["name"], "livepatch_checked_nothing-to-apply") | ||
| self.assertTrue("Other" in str(patch["classifications"])) |
| self.assertEqual(len(substatus_file_data), 1) | ||
| errors = json.loads(substatus_file_data[0]["formattedMessage"]["message"])["errors"] | ||
| self.assertNotEqual(errors, None) | ||
| self.assertTrue("The Ubuntu Pro client reported that the Livepatch service is not enabled" in str(errors)) |
| self.assertEqual(substatus_file_data["name"], Constants.PATCH_INSTALLATION_SUMMARY) | ||
| patch = json.loads(substatus_file_data["formattedMessage"]["message"])["patches"][0] | ||
| self.assertEqual(patch["name"], "livepatch_checked_nothing-to-apply") | ||
| self.assertTrue("Other" in str(patch["classifications"])) |

Livepatching in Canonical:
The apt package manager is responsible for installing .deb packages on Ubuntu LTS (long-term support) and interim releases, including the .deb package for the Linux kernel. Updating kernel packages requires a system restart and leaves the system vulnerable between the update and the restart. Canonical offers livepatch as a solution to protect the system during this vulnerable period. Livepatch modifies the vulnerable kernel code in memory and redirects function calls to the patched version before a reboot.
We aim to leverage this livepatch solution in our product to offer customers the ability to have their VMs patched without needing frequent reboots.
Ubuntu has introduced a timestamp-based livepatch rollout, which restricts the livepatches being applied/installed to only those available on or before the given timestamp.
For example:
$ canonical-livepatch config cutoff-date="2024-10-01T12:00:00Z"
NOTE: cutoff-date must be in the past
Once the cutoff-date is set, the Livepatch client, during its next check-in, will apply any livepatches available on or before the cutoff-date.
Livepatch client: software running on a machine, that periodically checks for the availability of new patches. Once new patches are available, they are downloaded, verified, and applied to the current kernel.
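As an illustration of the cutoff-date rule above, here is a small sketch that builds the command string shown in the example and rejects a cutoff-date that is not in the past. The function name and the optional now parameter are hypothetical; only the command text itself comes from the example.

```python
from datetime import datetime, timezone
from typing import Optional


def build_cutoff_date_command(cutoff_date: str, now: Optional[datetime] = None) -> str:
    """Returns the canonical-livepatch config command for an ISO 8601 extended UTC timestamp.

    Raises ValueError if the timestamp is malformed or is not in the past.
    """
    parsed = datetime.strptime(cutoff_date, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    if parsed >= current:
        raise ValueError("cutoff-date must be in the past: {0}".format(cutoff_date))
    return 'canonical-livepatch config cutoff-date="{0}"'.format(cutoff_date)
```

Accepting now as a parameter keeps the check deterministic in unit tests.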
Ref:
Live Linux kernel patching with progressive timestamped rollouts | Ubuntu
Livepatch Documentation | Ubuntu
AzGPS solution:
The solution implemented in this PR is an MVP to demonstrate how AzGPS will incorporate livepatching. For the MVP, livepatching is added as part of the patch installation operation, i.e. livepatching is performed only during an installation operation, and livepatches are applied before any regular patches are installed on the machine.
Customer pre-reqs:
- Feature flag: Customers need to opt-in to livepatching via an in-VM configuration. Only the customers opted in to livepatching will be considered.
- Livepatching is only available on Ubuntu LTS VMs with a paid pro subscription. Customer needs to attach a paid pro subscription and ensure 'livepatch' pro service is enabled on their VMs.
AzGPS reads the in-VM config/feature flag set by the customer.
Configure Patching and Assess Patches are completed without any changes.
During patch installation, if livepatching is requested, AzGPS first:
- Ensures all other pre-reqs are met.
- Sets the cutoff date for the canonical livepatch service using max_patch_publish_date.
- Triggers the Livepatch client (this is to ensure we fetch and report the latest livepatch status).
- Fetches and reports the livepatch status as an individual patch entry in the 'PatchInstallationSummary' subsection of the status blob.
- Continues with the regular patch installation.
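The steps above can be sketched as a small orchestration function. This is an illustration under assumptions, not the PR's implementation: the callables and the returned step labels are hypothetical. The branch behaviour follows the review threads: a failed cutoff-date update stops the livepatch step, while a failed client launch still reports a (possibly stale) status. Regular patch installation is left to the caller and proceeds in every case.

```python
from typing import Callable, List


def run_livepatch_step(livepatch_requested: bool,
                       prereqs_met: Callable[[], bool],
                       set_cutoff_date: Callable[[], bool],
                       launch_client: Callable[[], bool],
                       report_status: Callable[[], None]) -> List[str]:
    """Runs the livepatch portion of an installation and returns labels for what happened."""
    outcome = []  # type: List[str]
    if not livepatch_requested:
        return outcome
    if not prereqs_met():
        outcome.append("prereqs_not_met")
        return outcome
    if not set_cutoff_date():
        # Without a cutoff date the step stops here; no status is reported.
        outcome.append("cutoff_date_not_set")
        return outcome
    if not launch_client():
        # Best effort only: the status fetched next may be stale.
        outcome.append("launch_failed")
    report_status()
    outcome.append("status_reported")
    return outcome
```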
Reporting livepatch status in AzGPS:
Canonical provides the following livepatch status:
{
  "Client-Version": "<>",
  "Machine-Id": "<>",
  "Architecture": "<>",
  "CPU-Model": "<>",
  "Last-Check": "<>",
  "Boot-Time": "<>",
  "Uptime": "<>",
  "Status": [
    {
      "Kernel": "<>",
      "Running": true,
      "Livepatch": {
        "CheckState": "checked",
        "State": "<>", // "nothing-to-apply" or "applied"
        "Version": "", // "" or a version such as "1.0"
        "Fixes": [ // empty if no livepatches available or a list of CVEs installed
          {
            "Name": "<>", // CVE identifier such as CVE-000-0000
            "Description": "<>", // description of the livepatch fix
            "Bug": "",
            "Patched": <bool> // boolean value indicating status
          }
        ]
      },
      "Supported": "<>", // "supported" or a quick text on what is needed such as "kernel-upgrade-required"
      "UpgradeRequiredDate": "<>" // date
    }
  ],
  "tier": "updates",
  "Excluded-LSNs": [], // List of excluded LSNs
  "Fixed-CVEs": {
    "Timestamp": "",
    "Kernel-Package-Fixes": [], // list of all kernel packages fixed
    "Installed-Kernels": [],
    "Patched-CVEs": [], // list of patched CVE identifiers
    "Digest": ""
  },
  "Blocking-Options": [ // List of configs blocking livepatch, if any. For example: cutoff-date set for livepatch client
    "cutoff-date"
  ],
  "Using-Cutoff-Date": <bool> // boolean value indicating whether livepatch client is using cutoff-date config or not
}
From this livepatch status, we fetch "CheckState", "State" and "Version" from [Status][Livepatch] and add them as a single patch entry in PatchInstallationSummary, with patch_name: "livepatch_<CheckState>_<State>" and version: "<Version>".
For example, a livepatch status added as a single patch within PatchInstallationSummary:
{"patchId": "livepatch_checked_nothing-to-apply__Ubuntu_22.04", "name": "livepatch_checked_nothing-to-apply", "version": "", "classifications": ["Other"], "patchInstallationState": "NotSelected"}
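A minimal sketch of this mapping, assuming the status JSON has the shape shown above. The function name and the os_suffix parameter are hypothetical. The patchId composition (name, version, OS suffix joined by underscores) is inferred from the single sample entry, and the "Installed" state string is an assumption; only "NotSelected" appears in the sample.

```python
from typing import Any, Dict, Optional


def build_livepatch_patch_entry(livepatch_status: Dict[str, Any], os_suffix: str) -> Optional[Dict[str, Any]]:
    """Builds one patch entry from the first running, supported kernel in the status, or None if there is none."""
    for status_item in livepatch_status.get("Status", []):
        is_running = status_item.get("Running", False) is True
        is_supported = str(status_item.get("Supported", "unsupported")).lower() == "supported"
        if not (is_running and is_supported):
            continue
        livepatch = status_item.get("Livepatch", {})
        state = livepatch.get("State", "")
        version = livepatch.get("Version", "")
        name = "livepatch_" + livepatch.get("CheckState", "") + "_" + state
        return {
            "patchId": name + "_" + version + "_" + os_suffix,  # assumed composition, inferred from the sample
            "name": name,
            "version": version,
            "classifications": ["Other"],
            # "Installed" is an assumed value for the applied state.
            "patchInstallationState": "Installed" if state.lower() == "applied" else "NotSelected",
        }
    return None
```

Calling it with a status whose State is "nothing-to-apply", an empty Version, and os_suffix "Ubuntu_22.04" yields an entry matching the sample above.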
Logs from in VM test runs:
Test case: Livepatching is not requested by the customer.
Expected output: No livepatches applied:
5.core.log
5.status.txt
Test case: Livepatching is requested by the customer but not enabled on the machine (i.e. not all livepatching pre-reqs are met, for example no pro subscription attached, or the livepatch service not enabled in pro services):
Expected output: No livepatches applied, with or without an error added in status
Test case 1: livepatch service not enabled:
Expected output: No livepatches applied, no error added in status:
11.core.log
11.status.txt
sample run 2:
3.core.log
3.status.txt
Test case 2: Pro NOT attached
Expected output: No livepatches applied, no error added in status
4.core.log
4.status.txt
Test case 3: Free pro sub attached:
Expected output: No livepatches applied, error added in status blob:
14.core.log
14.status.txt
sample run 2:
5.core.log
5.status.txt
Test case: Livepatching is requested by the customer and enabled on the VM (i.e. pre-reqs met), but no livepatches available:
Expected output: No new livepatches applied, livepatch status reported in status blob with patch name ~= 'livepatch_checked_nothing-to-apply____' and an empty patch version
10.core.log
10.status.txt
livepatching.settings.txt
sample run 2:
8.core.log
8.status.txt
Expected output: Livepatches applied, livepatch status reported in status blob with patch name ~= 'livepatch_checked_applied____' and non-empty patch version